arXiv:1506.05032v5 [cs.CV] 29 Mar 2016 


IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 35, NO. 3, MARCH 2016 


Histopathological Image Classification using 
Discriminative Feature-oriented Dictionary Learning 

Tiep Huu Vu, Student Member, IEEE, Hojjat Seyed Mousavi, Student Member, IEEE,
Vishal Monga, Senior Member, IEEE, Ganesh Rao and UK Arvind Rao


Abstract: In histopathological image analysis, feature extraction
for classification is a challenging task due to the diversity
of histology features suitable for each problem as well as the
presence of rich geometrical structures. In this paper, we propose
an automatic feature discovery framework via learning class-specific
dictionaries and present a low-complexity method for
classification and disease grading in histopathology. Essentially,
our Discriminative Feature-oriented Dictionary Learning (DFDL)
method learns class-specific dictionaries such that, under a sparsity
constraint, the learned dictionaries allow representing a new
image sample parsimoniously via the dictionary corresponding to
the class identity of the sample. At the same time, the dictionary
is designed to be poorly capable of representing samples from
other classes. Experiments on three challenging real-world image
databases: 1) histopathological images of intraductal breast
lesions, 2) mammalian kidney, lung and spleen images provided
by the Animal Diagnostics Lab (ADL) at Pennsylvania State
University, and 3) brain tumor images from The Cancer Genome
Atlas (TCGA) database, reveal the merits of our proposal
over state-of-the-art alternatives. Moreover, we demonstrate that
DFDL exhibits a more graceful decay in classification accuracy
against the number of training images, which is highly desirable
in practice where generous training is often not available.

Index terms: Histopathological image classification, sparse
coding, dictionary learning, feature extraction, cancer grading.

I. Introduction

Automated histopathological image analysis has recently
become a significant research problem in medical imaging, and
there is an increasing need for developing quantitative image
analysis methods as a complement to the effort of pathologists
in the diagnosis process. Consequently, an emerging class of
problems in medical imaging focuses on the development
of computerized frameworks to classify histopathological
images. These advanced image analysis methods have
been developed with three main purposes: (i) relieving the
workload on pathologists by sieving out obviously diseased
and also healthy cases, which allows specialists to spend more
time on more sophisticated cases; (ii) reducing inter-expert
variability; and (iii) understanding the underlying reasons for
a specific diagnosis that pathologists might not realize.

In the diagnosis process, pathologists often look for
problem-specific visual cues, or features, in histopathological
images in order to assign a tissue image to one of the
possible categories. These features might come from the
distinguishable characteristics of cells or nuclei, for example, size,
shape or texture. They could also come from spatially
related structures of cells. In some cancer
grading problems, features might include the presence of
particular regions. Consequently, different customized
feature extraction techniques for a variety of problems have
been developed based on these observed features.
Morphological image features have been utilized in medical
image segmentation for detection of vessel-like patterns.
Wavelet features and histograms are also a popular choice of
features for medical imaging. Graph-based features
such as Delaunay triangulation, Voronoi diagram, minimum
spanning tree and query graphs have also been used to
exploit spatial structures. Orlov et al. have proposed
a multi-purpose framework that collects texture information,
image statistics and transform domain coefficients into a set
of features. For classification purposes, these features are
combined with powerful classifiers such as neural networks
or support vector machines (SVMs). Gurcan et al. provided
a detailed discussion of feature and classifier selection for
histopathological analysis.

Sparse representation frameworks have also been proposed
for medical applications recently. Specifically,
Srinivas et al. represented a multi-channel histopathological
image as a sparse linear combination of training examples
under channel-wise constraints and proposed a residual-based
classification technique. Yu et al. proposed a method for
cervigram segmentation based on sparsity and group clustering
priors. Song et al. proposed a locality-constrained
and a large-margin representation method for medical image
classification. In addition, Parvin et al. combined a dictionary
learning framework with an autoencoder to learn sparse
features for classification. Chang et al. extended this
work by adding spatial pyramid matching to enhance the
performance.

A. Challenges and Motivation 

While histopathological analysis shares some traits with 
other image classification problems, there are also principally 
distinct challenges specific to histopathology. The central 
challenge comes from the geometric richness of tissue images, 
resulting in the difficulty of obtaining reliable discriminative 
features for classification. Tissues from different organs have
structural and morphological diversity, which often leads to
highly customized feature extraction solutions for each problem;
hence the techniques lack broad applicability.

Our work aims to produce a more versatile histopathological
image classification system through the design of
discriminative, class-specific dictionaries; the system is hence capable
of automatic feature discovery using example training image
samples. Our proposal evolves from the sparse representation-based
classifier (SRC) [24], which has received significant
attention recently [25]-[27]. Wright et al. [24] proposed SRC
with the assumption that given a sufficient collection of training
samples from one class, which is referred to as a dictionary,
any other test sample from the same class can be roughly
expressed as a linear combination of these training samples.
As a result, any test sample has a sparse representation in
terms of a big dictionary comprising class-specific
sub-dictionaries. Recent work has shown that learned and data
adaptive dictionaries significantly outperform ones constructed
by simply stacking training samples together as in [24]. In
particular, methods with class-specific constraints
are known to further enhance classification performance.

Being mindful of the aforementioned challenges, we design,
via optimization, a discriminative dictionary for each
class by imposing sparsity constraints that minimize intra-class
differences while simultaneously emphasizing inter-class
differences. On one hand, small intra-class differences
encourage the comprehensiveness of the set of learned bases,
which is then able to represent in-class samples with only
a few bases (intra-class sparsity). This encouragement forces
the model to find the representative bases in that class. On
the other hand, large inter-class differences prevent bases of a
class from sparsely representing samples from other classes.
Concretely, given a dictionary $D$ from a particular class with
$k$ bases and a certain sparsity level $L \le k$, we define an
$L$-subspace of $D$ as the span of a subset of $L$ bases from $D$. Our
proposed Discriminative Feature-oriented Dictionary Learning
(DFDL) aims to build dictionaries with this key property: any
sample from a class is reasonably close to an $L$-subspace of
the associated dictionary while a complementary sample is
far from any $L$-subspace of that dictionary. An illustration of the
proposed idea is shown in Fig. 1.
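
The key property above can be made concrete with a small sketch. It is not part of the DFDL software: the function name `dist_to_best_L_subspace` and the toy dictionary are hypothetical, and the exhaustive search over all subsets of bases is only feasible for tiny `k` (the paper relies on greedy sparse coding instead).

```python
from itertools import combinations

import numpy as np


def dist_to_best_L_subspace(y, D, L):
    """Smallest residual norm of y over the spans of all L-column subsets of D."""
    best = np.inf
    for idx in combinations(range(D.shape[1]), L):
        sub = D[:, list(idx)]
        # Least-squares projection of y onto the span of the chosen bases.
        coef, *_ = np.linalg.lstsq(sub, y, rcond=None)
        best = min(best, float(np.linalg.norm(y - sub @ coef)))
    return best


# Toy dictionary (assumption): the first three coordinate axes of R^4.
D = np.eye(4)[:, :3]
in_class = np.array([1.0, 2.0, 0.0, 0.0])    # lies in span{d_1, d_2}
complement = np.array([0.0, 0.0, 0.0, 1.0])  # orthogonal to every basis

d_in = dist_to_best_L_subspace(in_class, D, L=2)
d_comp = dist_to_best_L_subspace(complement, D, L=2)
```

In this toy case the in-class sample sits on a 2-subspace of `D` while the complementary sample stays at distance 1 from all of them, which is the separation DFDL tries to enforce by learning `D`.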

B. Contributions 

The main contributions of this paper are as follows: 

1) A new discriminative dictionary learning method¹
for automatic feature discovery in histopathological images
is presented to mitigate the generally difficult problem of
feature extraction in histopathological images. Our discriminative
framework learns dictionaries that emphasize inter-class
differences while keeping intra-class differences small,
resulting in enhanced classification performance. The design is
based on solving a sparsity constrained optimization problem,
for which we develop a tractable algorithmic solution.

2) Broad Experimental Validation and Insights. Experimental
validation of DFDL is carried out on three diverse
histopathological datasets to show its broad applicability. The
first dataset is courtesy of the Clarian Pathology Lab and
Computer and Information Science Dept., Indiana University-Purdue
University Indianapolis (IUPUI). The images, acquired
by a previously described process, correspond to human
Intraductal Breast Lesions (IBL). Two well-defined categories
will be classified: Usual Ductal Hyperplasia (UDH)-benign,
and Ductal Carcinoma In Situ (DCIS)-actionable. The second
dataset contains images of brain cancer (glioblastoma or
GBM) obtained from The Cancer Genome Atlas (TCGA)
provided by the National Institutes of Health, and will henceforth
be referred to as the TCGA dataset. For this dataset, we
address the problem of detecting Micro Vascular Proliferation
(MVP) regions, which is an important indicator of a high grade
glioma (HGG). The third dataset is provided by the Animal
Diagnostics Lab (ADL), The Pennsylvania State University.
It contains tissue images from three mammalian organs:
kidney, lung and spleen. For each organ, images will be
assigned to one of two categories, healthy or inflammatory.
Samples from these three datasets are given in Figs. 2, 3 and
4, respectively. Extensive experimental results show that our
method outperforms many competing methods, particularly
in low training scenarios. In addition, Receiver Operating
Characteristic (ROC) curves are provided that facilitate a
trade-off between false alarm and miss rates.

¹ The preliminary version of this work was presented at IEEE International
Symposium on Biomedical Imaging, 2015.

Figure 1: Main idea: a) The sparse representation space of a
dictionary learned using in-class samples only, e.g. by KSVD or ODL
(may also cover some complementary samples), and b) the desired
DFDL space (covers in-class samples only).

3) Complexity analysis. We derive the computational 
complexity of DFDL as well as competing dictionary learning 
methods in terms of approximate number of operations needed. 
We also report experimental running time of DFDL and three 
other dictionary learning methods. 

4) Reproducibility. All results in the manuscript are reproducible
via a user-friendly software². The software (MATLAB
toolbox) is also provided in the hope that it will be used in future
research and comparisons by peer researchers.

The remainder of this paper is organized as follows. Our
proposed DFDL via a sparsity constrained optimization and
the solution for the said optimization problem are detailed
in Section II. Section II-D also presents our algorithmic
classification procedures for the three diverse histopathological
problems stated above. Section III presents classification
accuracy as well as run-time complexity comparisons with existing
methods in the literature to reveal the merits of the proposed
DFDL. A detailed analytical comparison of complexity against
competing dictionary learning methods is provided in the
Appendix. Section IV concludes the paper.

² The software can be downloaded at http://signal.ee.psu.edu/dfdl.html

Figure 2: Samples from IBL dataset: left-UDH, right-DCIS

Figure 3: Samples from TCGA dataset. Left: regions without MVP.
Right: regions with MVP are inside blue ovals.

II. Contributions 

A. Notation 

The vectorization of a small block (or patch³) extracted
from an image is denoted as a column vector $y \in \mathbb{R}^d$, which
will be referred to as a sample. In a classification problem where
we have $c$ different categories, the collection of all data samples
from class $i$ ($i$ can vary between 1 and $c$) forms the matrix
$Y_i \in \mathbb{R}^{d \times N_i}$, and we let $\bar{Y}_i \in \mathbb{R}^{d \times \bar{N}_i}$ be the matrix containing all
complementary data samples, i.e. those that are not in class
$i$. We denote by $D_i \in \mathbb{R}^{d \times k_i}$ the dictionary of class $i$ that is
desired to be learned through our DFDL method.
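
As a rough illustration of how the sample matrices could be assembled, the sketch below vectorizes RGB patches by concatenating the three channels, as the paper's footnote on training vectors describes. The helper names `vectorize_patch` and `random_patches`, the random test image and the patch counts are assumptions made for this example only.

```python
import numpy as np


def vectorize_patch(patch):
    """Stack the R, G and B channels of an (h, w, 3) patch into one vector of length 3*h*w."""
    return np.concatenate([patch[:, :, c].ravel() for c in range(3)])


def random_patches(image, size, n, rng):
    """Draw n (possibly overlapping) size-by-size patches; return a (3*size*size, n) matrix."""
    h, w, _ = image.shape
    rows = rng.integers(0, h - size + 1, n)
    cols = rng.integers(0, w - size + 1, n)
    columns = [vectorize_patch(image[r:r + size, c:c + size]) for r, c in zip(rows, cols)]
    return np.stack(columns, axis=1)


rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))  # stand-in for one labeled training image
Y = random_patches(image, size=20, n=50, rng=rng)  # one column per sample
```

With 20-by-20 patches each column has dimension 3 * 20 * 20 = 1200, so `Y` plays the role of the in-class matrix for whichever class the image belongs to.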

For a vector $s \in \mathbb{R}^k$ we denote by $\|s\|_0$ the number of
its non-zero elements. The sparsity constraint on $s$ can be
formulated as $\|s\|_0 \le L$. For a matrix $S$, $\|S\|_0 \le L$ means
that each column of $S$ has no more than $L$ non-zero elements.



³ In our work, a training vector is obtained by vectorizing all three RGB
channels followed by concatenating them together to have a long vector.

Figure 4: Samples from ADL dataset. First row: kidney. Second
row: lung. Last row: spleen. Left: healthy. Right: inflammatory.

B. Discriminative Feature-oriented Dictionary Learning

We aim to build class-specific dictionaries $D_i$ such that
each $D_i$ can sparsely represent samples from class $i$ but is
poorly capable of representing its complementary samples
with a small number of bases. Concretely, for the learned
dictionaries we need:

$$\min_{\|s_l\|_0 \le L_i} \|y_l - D_i s_l\|_2^2, \quad \forall l = 1, 2, \dots, N_i \ \text{to be small}$$

and

$$\min_{\|\bar{s}_m\|_0 \le L_i} \|\bar{y}_m - D_i \bar{s}_m\|_2^2, \quad \forall m = 1, 2, \dots, \bar{N}_i \ \text{to be large},$$

where $L_i$ controls the sparsity level. These two sets of conditions
could be simplified in the matrix form:

$$\text{intra-class differences:} \quad \frac{1}{N_i} \min_{\|S_i\|_0 \le L_i} \|Y_i - D_i S_i\|_F^2 \ \text{small}, \tag{1}$$

$$\text{inter-class differences:} \quad \frac{1}{\bar{N}_i} \min_{\|\bar{S}_i\|_0 \le L_i} \|\bar{Y}_i - D_i \bar{S}_i\|_F^2 \ \text{large}. \tag{2}$$

The averaging operations $\frac{1}{N_i}$ and $\frac{1}{\bar{N}_i}$ are taken here to
avoid the case where the largeness of inter-class differences
results solely from $\bar{N}_i \gg N_i$.

For simplicity, from now on, we consider only one
class and drop the class index in each notation, i.e., using
$Y, \bar{Y}, D, S, \bar{S}, N, \bar{N}, L$ instead of $Y_i, \bar{Y}_i, D_i, S_i, \bar{S}_i, N_i, \bar{N}_i$ and $L_i$. Based
on the argument above, we formulate the optimization problem
for each dictionary:

$$D^* = \arg\min_{D} \left( \frac{1}{N} \min_{\|S\|_0 \le L} \|Y - DS\|_F^2 - \frac{\rho}{\bar{N}} \min_{\|\bar{S}\|_0 \le L} \|\bar{Y} - D\bar{S}\|_F^2 \right), \tag{3}$$

where $\rho$ is a positive regularization parameter. The first term
in the above optimization problem encourages intra-class
differences to be small, while the second term, with a minus
sign, emphasizes inter-class differences. By solving the above
problem, we can jointly find the appropriate dictionaries as we
desire in (1) and (2).

How to choose L: The sparsity level $L$ might differ
across classes. For one class, if $L$ is too small, the dictionary might
not appropriately express in-class samples, while if it is too
large, the dictionary might be able to represent complementary
samples as well. In both cases, the classifier might fail to
determine the identity of a new test sample. We propose a
method for estimating $L$ as follows. First, a dictionary is
learned by ODL using in-class samples $Y$ only:

$$(D^0, S^0) = \arg\min_{D, S} \{ \|Y - DS\|_F^2 + \lambda \|S\|_1 \}, \tag{4}$$

where $\lambda$ is a positive regularization parameter controlling the
sparsity level. Note that the same $\lambda$ can still lead to different $L$
for different classes, depending on the intra-class variability of
each class. Without prior knowledge of those variabilities, we
choose the same $\lambda$ for every class. After $D^0$ and $S^0$ have been
computed, $D^0$ could be utilized as a warm initialization of $D$
in our algorithm, and $S^0$ could be used to estimate the sparsity
level $L$:

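$$L \approx \frac{1}{N} \sum_{i=1}^{N} \|s_i^0\|_0, \tag{5}$$

where $s_i^0$ denotes the $i$-th column of $S^0$.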
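
A minimal sketch of this estimate follows, assuming that (5) averages the number of non-zero coefficients over the columns of the initial sparse codes. The function name `estimate_sparsity_level`, the tolerance used to decide what counts as non-zero, and the rounding to an integer are choices made for the example, not details confirmed by the paper.

```python
import numpy as np


def estimate_sparsity_level(S0, tol=1e-8):
    """Average number of non-zero entries per column of S0, rounded to an integer."""
    nnz_per_column = np.sum(np.abs(S0) > tol, axis=0)
    return int(round(float(nnz_per_column.mean())))


# Toy sparse-code matrix: k = 4 bases (rows), N = 3 samples (columns).
S0 = np.array([[0.5, 0.0, 0.0],
               [0.0, 0.0, 1.2],
               [0.3, 0.7, 0.0],
               [0.0, 0.0, 0.4]])
L = estimate_sparsity_level(S0)
```

Here the columns use 2, 1 and 2 bases respectively, so the average of 5/3 rounds to a sparsity level of 2.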

Classification scheme: In the same manner as SRC [24],
a new patch $y$ is classified as follows. First, the sparse code
$\hat{s}$ is calculated via $\ell_1$-norm minimization:

$$\hat{s} = \arg\min_{s} \{ \|y - D_{\text{total}}\, s\|_2^2 + \gamma \|s\|_1 \}, \tag{6}$$

where $D_{\text{total}} = [D_1, D_2, \dots, D_c]$ is the collection of all
dictionaries and $\gamma$ is a scalar constant. Second, the identity of $y$ is
determined as $\arg\min_{i \in \{1, \dots, c\}} \{r_i(y)\}$, where

$$r_i(y) = \|y - D_i \delta_i(\hat{s})\|_2 \tag{7}$$

and $\delta_i(\hat{s})$ is the part of $\hat{s}$ associated with class $i$.
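
The two-stage rule in (6) and (7) can be sketched as follows. The paper does not say which $\ell_1$ solver it uses for (6); plain iterative soft-thresholding (ISTA) is swapped in here as a simple stand-in, and the names `ista` and `classify`, the value of `gamma` and the toy dictionaries are all assumptions for illustration.

```python
import numpy as np


def ista(D, y, gamma, n_iter=500):
    """Minimise ||y - D s||_2^2 + gamma * ||s||_1 by iterative soft-thresholding."""
    step = 1.0 / (2.0 * np.linalg.norm(D, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    s = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = s - step * 2.0 * (D.T @ (D @ s - y))  # gradient step on the quadratic term
        s = np.sign(z) * np.maximum(np.abs(z) - step * gamma, 0.0)  # soft threshold
    return s


def classify(y, dictionaries, gamma=0.01):
    """Return the class index with the smallest residual r_i(y), plus all residuals."""
    D_total = np.hstack(dictionaries)
    s = ista(D_total, y, gamma)
    residuals, start = [], 0
    for D_i in dictionaries:
        s_i = s[start:start + D_i.shape[1]]  # delta_i(s): codes tied to class i
        residuals.append(float(np.linalg.norm(y - D_i @ s_i)))
        start += D_i.shape[1]
    return int(np.argmin(residuals)), residuals


# Toy two-class problem: class 0 spans the first two axes, class 1 the last two.
D1, D2 = np.eye(4)[:, :2], np.eye(4)[:, 2:]
y = np.array([1.0, 0.5, 0.0, 0.0])
label, residuals = classify(y, [D1, D2])
```

Because `y` lies in the span of the first dictionary, almost all of its energy is explained by class 0 and the residual for class 1 stays close to the norm of `y`.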

C. Proposed solution 

We use an iterative method to solve the problem in (3).
Specifically, the process alternates between
fixing $D$ while optimizing $S$ and $\bar{S}$, and vice versa.

In the sparse coding step, with fixed $D$, the optimal sparse
codes $S^*, \bar{S}^*$ can be found by solving:

$$S^* = \arg\min_{\|S\|_0 \le L} \|Y - DS\|_F^2; \quad \bar{S}^* = \arg\min_{\|\bar{S}\|_0 \le L} \|\bar{Y} - D\bar{S}\|_F^2.$$

With the same dictionary $D$, these two sparse coding
problems can be combined into the following one:

$$\hat{S}^* = \arg\min_{\|\hat{S}\|_0 \le L} \|\hat{Y} - D\hat{S}\|_F^2 \tag{8}$$

with $\hat{Y} = [Y, \bar{Y}]$ being the matrix of all training samples
and $\hat{S} = [S, \bar{S}]$. This sparse coding problem can be solved
effectively by OMP using the SPAMS toolbox.
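
For readers without the SPAMS toolbox, the joint sparse coding step can be sketched with a plain NumPy version of orthogonal matching pursuit. This is a simplified stand-in, not the SPAMS implementation; the function names `omp` and `sparse_code` and the toy data are hypothetical, and the dictionary columns are assumed to have unit norm.

```python
import numpy as np


def omp(D, y, L):
    """Greedy OMP: select at most L columns of D (assumed unit-norm) to approximate y."""
    residual, support = y.astype(float), []
    s = np.zeros(D.shape[1])
    coef = np.zeros(0)
    for _ in range(L):
        if np.linalg.norm(residual) < 1e-12:
            break
        corr = np.abs(D.T @ residual)
        if support:
            corr[support] = 0.0  # never pick the same basis twice
        support.append(int(np.argmax(corr)))
        # Refit all selected coefficients by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    if support:
        s[support] = coef
    return s


def sparse_code(D, Y_in, Y_comp, L):
    """Code in-class and complementary samples jointly against the same D, as in (8)."""
    Y_all = np.hstack([Y_in, Y_comp])
    S_all = np.column_stack([omp(D, y, L) for y in Y_all.T])
    n_in = Y_in.shape[1]
    return S_all[:, :n_in], S_all[:, n_in:]


D = np.eye(4)
Y_in = np.array([[3.0], [0.0], [1.0], [0.0]])
Y_comp = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 0.0], [0.0, 5.0]])
S_in, S_comp = sparse_code(D, Y_in, Y_comp, L=1)
```

Stacking the two sample sets before coding mirrors the observation that both problems share the same dictionary, so one pass over all columns suffices.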

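For the bases update stage, $D^*$ is found by solving:

$$D^* = \arg\min_{D} \left\{ \frac{1}{N} \|Y - DS\|_F^2 - \frac{\rho}{\bar{N}} \|\bar{Y} - D\bar{S}\|_F^2 \right\} \tag{9}$$

$$\phantom{D^*} = \arg\min_{D} \{ -2\,\mathrm{trace}(E D^T) + \mathrm{trace}(D F D^T) \}. \tag{10}$$

We have used the equation $\|M\|_F^2 = \mathrm{trace}(M M^T)$ for any
matrix $M$ to derive (10) from (9) and denoted:

$$E = \frac{1}{N} Y S^T - \frac{\rho}{\bar{N}} \bar{Y} \bar{S}^T; \quad F = \frac{1}{N} S S^T - \frac{\rho}{\bar{N}} \bar{S} \bar{S}^T. \tag{11}$$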
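
The step from (9) to (10) drops a term that does not depend on $D$. A quick numerical check of that identity, under the definitions of $E$ and $F$ in (11), is sketched below; the dimensions, the random data and the value of `rho` are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, N, N_bar, rho = 6, 4, 10, 15, 0.1
Y, Y_bar = rng.standard_normal((d, N)), rng.standard_normal((d, N_bar))
S, S_bar = rng.standard_normal((k, N)), rng.standard_normal((k, N_bar))
D = rng.standard_normal((d, k))

# E and F as defined in (11).
E = Y @ S.T / N - rho * Y_bar @ S_bar.T / N_bar
F = S @ S.T / N - rho * S_bar @ S_bar.T / N_bar

# Objective of (9), evaluated directly.
objective_9 = (np.linalg.norm(Y - D @ S, "fro") ** 2 / N
               - rho * np.linalg.norm(Y_bar - D @ S_bar, "fro") ** 2 / N_bar)

# Objective of (10), plus the D-independent term that was dropped.
objective_10 = -2.0 * np.trace(E @ D.T) + np.trace(D @ F @ D.T)
constant = (np.linalg.norm(Y, "fro") ** 2 / N
            - rho * np.linalg.norm(Y_bar, "fro") ** 2 / N_bar)
```

The two objectives differ only by `constant`, so they share the same minimizer over `D`.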


Algorithm 1 Discriminative Feature-oriented Dictionary
Learning

function $D^*$ = DFDL($Y, \bar{Y}, k, \rho$)

INPUT: $Y, \bar{Y}$: collection of all in-class samples and
complementary samples. $k$: number of bases in the dictionary.
$\rho$: the regularization parameter.

1. Choose initial $D^*$ and $L$ as in (4) and (5).
while not converged do

2. Fix $D = D^*$ and update $S, \bar{S}$ by solving (8).

3. Fix $S, \bar{S}$, calculate:

$E = \frac{1}{N} Y S^T - \frac{\rho}{\bar{N}} \bar{Y} \bar{S}^T; \quad F = \frac{1}{N} S S^T - \frac{\rho}{\bar{N}} \bar{S} \bar{S}^T.$

4. Update $D^*$ from:

$D^* = \arg\min_{D} \{ -2\,\mathrm{trace}(E D^T) + \mathrm{trace}(D (F - \lambda_{\min}(F) I_k) D^T) \}$

subject to: $\|d_j\|_2^2 = 1, \ j = 1, 2, \dots, k.$

end while
RETURN: $D^*$
end function


The objective function in (10) is very similar to the
objective function in the dictionary update stage of
ODL except that it is not guaranteed to be convex. It is convex
if and only if $F$ is positive semidefinite. For the discriminative
dictionary learning problem, the symmetric matrix $F$
is not guaranteed to be positive semidefinite, even though all of its
eigenvalues are real. In the worst case, where $F$ is negative
semidefinite, the objective function in (10) becomes concave;
if we apply the same dictionary update algorithm as in ODL,
we will obtain its maximum solution instead of the minimum.

To deal with this situation, we propose a technique which 
convexifies the objective function based on the following 
observation. 

If we look back at the main optimization problem stated
in (3):

$$D^* = \arg\min_{D} \left( \frac{1}{N} \min_{\|S\|_0 \le L} \|Y - DS\|_F^2 - \frac{\rho}{\bar{N}} \min_{\|\bar{S}\|_0 \le L} \|\bar{Y} - D\bar{S}\|_F^2 \right),$$

we can see that if $D = [d_1 \ d_2 \ \dots \ d_k]$ is an optimal
solution, then $D' = \left[ \frac{d_1}{a_1} \ \frac{d_2}{a_2} \ \dots \ \frac{d_k}{a_k} \right]$ is also an optimal
solution as we multiply the $j$-th rows of the optimal $S$ and $\bar{S}$ by
$a_j$, where $a_j$, $j = 1, 2, \dots, k$, are arbitrary nonzero scalars.
Consequently, we can introduce the constraints $\|d_j\|_2^2 = 1$, $j =
1, 2, \dots, k$, without affecting the optimal value of (10). With these
constraints, $\mathrm{trace}(D (\lambda_{\min}(F) I_k) D^T) = \lambda_{\min}(F)\,\mathrm{trace}(D^T D) =
\lambda_{\min}(F) \sum_{j=1}^{k} d_j^T d_j = k \lambda_{\min}(F)$, where $\lambda_{\min}(F)$ is the minimum
eigenvalue of $F$ and $I_k$ denotes the identity matrix, is a
constant. Subtracting this constant from the objective function
will not change the optimal solution to (10). Essentially, the
following problem in (12) is equivalent to (10):

$$D^* = \arg\min_{D} \{ -2\,\mathrm{trace}(E D^T) + \mathrm{trace}(D (F - \lambda_{\min}(F) I_k) D^T) \} \tag{12}$$

subject to: $\|d_j\|_2^2 = 1, \ j = 1, 2, \dots, k.$

Figure 5: IBL/ADL classification procedure


The matrix $\hat{F} = F - \lambda_{\min}(F) I_k$ is guaranteed to be positive
semidefinite since all of its eigenvalues are now nonnegative,
and hence the objective function in (12) is convex. Now, this
optimization problem is very similar to the dictionary update
problem in ODL. Then, $D^*$ could be updated by the following
iterations until convergence:

$$u_j \leftarrow \frac{1}{\hat{F}_{jj}} (e_j - D \hat{f}_j) + d_j, \tag{13}$$

$$d_j \leftarrow \frac{u_j}{\|u_j\|_2}, \tag{14}$$

where $\hat{F}_{jj}$ is the value of $\hat{F}$ at coordinate $(j, j)$, $\hat{f}_j$ denotes
the $j$-th column of $\hat{F}$ and $e_j$ the $j$-th column of $E$.
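
A compact sketch of the convexified update follows. It assumes the column-wise iterations take the form of an ODL-style block-coordinate descent with normalization onto the unit sphere; the names `objective_12` and `update_dictionary`, the fixed number of sweeps and the random test matrices are assumptions for illustration and are not taken from the released MATLAB toolbox.

```python
import numpy as np


def objective_12(D, E, F):
    """Objective of (12): -2 trace(E D^T) + trace(D (F - lambda_min(F) I) D^T)."""
    F_hat = F - np.linalg.eigvalsh(F).min() * np.eye(F.shape[0])
    return float(-2.0 * np.trace(E @ D.T) + np.trace(D @ F_hat @ D.T))


def update_dictionary(D, E, F, n_sweeps=50):
    """Shift F so that it is positive semidefinite, then update one column at a time."""
    k = D.shape[1]
    F_hat = F - np.linalg.eigvalsh(F).min() * np.eye(k)
    D = D.copy()
    for _ in range(n_sweeps):
        for j in range(k):
            if F_hat[j, j] < 1e-12:
                continue  # skip degenerate coordinates to avoid dividing by ~0
            u = (E[:, j] - D @ F_hat[:, j]) / F_hat[j, j] + D[:, j]  # step (13)
            norm = np.linalg.norm(u)
            if norm > 1e-12:
                D[:, j] = u / norm  # step (14): back onto the unit sphere
    return D


rng = np.random.default_rng(0)
d, k = 8, 5
E = rng.standard_normal((d, k))
M = rng.standard_normal((k, k))
F = (M + M.T) / 2.0               # symmetric and, in general, indefinite
D0 = rng.standard_normal((d, k))
D0 /= np.linalg.norm(D0, axis=0)  # start from unit-norm columns
D1 = update_dictionary(D0, E, F)
```

With the other columns held fixed, each normalized column update is the exact minimizer over the unit sphere, so the objective in (12) should not increase from one sweep to the next.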

Our DFDL algorithm is summarized in Algorithm 1.

D. Overall classification procedures for three datasets 

In this section, we propose a DFDL-based procedure for 
classifying images in three datasets. 

1) IBL and ADL datasets 

The key idea in this procedure is that a healthy tissue image
largely consists of healthy patches which cover a dominant
portion of the tissue. This procedure is shown in Fig. 5 and
consists of the following three steps:

Step 1: Training DFDL bases for each class. From labeled
training images, training patches are randomly extracted (they
might be overlapping). The size of these patches is picked
based on pathologist input and/or chosen by cross validation
[38]. After we have a set of healthy patches and a set of
diseased patches for training, class-specific DFDL dictionaries
and the associated classifier are trained by using Algorithm 1.

Step 2: Learning a threshold $\theta$ for the proportion of healthy
patches in one healthy image. Labeled training images are now
divided into non-overlapping patches. Each of these patches
is then classified using the DFDL classifier as described in
Eqs. (6) and (7). The main purpose of this step is to find
the threshold $\theta$ such that healthy images have a proportion of
healthy patches greater than or equal to $\theta$ and diseased ones have
a proportion of healthy patches less than $\theta$. We can consider
the proportion of healthy patches in one training image as its
one-dimensional feature. This feature is then put into a simple
SVM to learn the threshold $\theta$.

Step 3: Classifying test images. For an unseen test image,
we calculate the proportion $x$ of healthy patches in the same
way described in Step 2. Now, the identity of the image
is determined by comparing the proportion $x$ to $\theta$. It is
categorized as healthy (diseased) if $x \ge (<)\ \theta$. The procedure
readily generalizes to multi-class problems.
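
Steps 2 and 3 reduce each image to a single number and compare it with a threshold. The sketch below illustrates that logic; instead of the one-dimensional SVM used in the paper, it swaps in a direct search for the threshold with the fewest training errors. The names `healthy_proportion` and `learn_threshold` and the training proportions are made up for the example.

```python
import numpy as np


def healthy_proportion(patch_labels, healthy=0):
    """Fraction of patches in one image that the patch classifier labels as healthy."""
    return float(np.mean(np.asarray(patch_labels) == healthy))


def learn_threshold(proportions, is_healthy):
    """Pick the threshold with the fewest training errors (stand-in for the 1-D SVM)."""
    p = np.asarray(proportions, dtype=float)
    t = np.asarray(is_healthy, dtype=bool)
    grid = np.unique(p)  # sorted distinct values
    candidates = np.concatenate([[0.0], (grid[:-1] + grid[1:]) / 2.0, [1.0]])
    errors = [int(np.sum((p >= c) != t)) for c in candidates]
    return float(candidates[int(np.argmin(errors))])


# Hypothetical per-image proportions of healthy patches on the training set.
train_props = [0.9, 0.8, 0.75, 0.3, 0.2, 0.4]
train_is_healthy = [True, True, True, False, False, False]
theta = learn_threshold(train_props, train_is_healthy)

# A test image whose patches were labeled 0 (healthy) or 1 (diseased).
x = healthy_proportion([0, 0, 0, 1, 0])
prediction = "healthy" if x >= theta else "diseased"
```

On separable training data such as this, the midpoint search and a linear SVM both place the threshold in the gap between the two groups, which is why the simpler rule is adequate for a sketch.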

2) MVP detection problem in TCGA dataset 

As described earlier, Micro Vascular Proliferation (MVP) is
the presence of blood vessels in a tissue and it is an important
indicator of a high-grade tumor in brain glioma. Essentially,
the presence of one such region in the tissue image indicates
a high-grade tumor. Detection of such regions in the TCGA
dataset is an inherently hard problem: unlike
images in the IBL and ADL datasets, which are distinguishable
by examining small regions, it requires more effort and
investigation of larger connected regions. This is because
an MVP region may vary significantly in size and is
usually surrounded by tumor cells which are actually benign
or low grade. In addition, an MVP region is characterized by
the presence of enlarged vessels in the tissue with different
color shading and thick layers of cell rings inside the vessel
(see Fig. 3).

Figure 6: MVP detection procedure. Training (dictionary learning)
phase: the input is labeled training images and the output is DFDL
dictionaries for Not MVP and MVP, learned from patches extracted
from ROIs (same strategy as the training phase for the IBL dataset).

We define a patch as MVP if it lies entirely within
an MVP region and as Not MVP otherwise. We also define a
region as Not MVP if it does not contain any MVP patch. The
procedure consists of two steps:

Step 1: Training phase. From training data, MVP regions and
Not MVP regions are manually extracted. Note that while
MVP regions come from MVP images only, Not MVP regions
might appear in all images. From these extracted regions,
DFDL dictionaries are obtained in the same way as in Step
1 of the IBL/ADL classification procedure described in Section
II-D1 and Fig. 5.

Step 2: MVP detection phase. A new unknown image is
decomposed into non-overlapping patches. These patches are
then classified using the DFDL model learned before. After this
step, we have a collection of patches classified as MVP.
A region with a large number of connected classified-as-MVP
patches could be considered as an MVP region. If the final
image does not contain any MVP region, we categorize the
image as Not MVP; otherwise, it is classified as MVP.
The definition of connected regions contains a parameter $m$,
which is the number of connected patches. Depending on $m$,
positive patches might or might not appear in the final step.
Specifically, if $m$ is small, false positives tend to be determined
as MVP patches; if $m$ is large, true positives are highly likely
to be eliminated. To determine $m$, we vary it from 1 to 20,
compute the ROC curve for training images and then simply
pick the point which is closest to the origin to find the
optimal $m$. This procedure is visualized in Fig. 6.
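
The detection rule can be sketched on a grid of patch labels. Two details are assumptions made here because the text does not fix them: patches are treated as connected when they share an edge (4-connectivity), and the ROC points are taken to be (false alarm rate, miss rate) pairs so that the point closest to the origin is the best one. The names `has_mvp_region` and `pick_m` and the toy rates are hypothetical.

```python
from collections import deque

import numpy as np


def has_mvp_region(patch_is_mvp, m):
    """True if some 4-connected group of MVP-labeled patches has at least m members."""
    grid = np.asarray(patch_is_mvp, dtype=bool)
    seen = np.zeros_like(grid)
    rows, cols = grid.shape
    for r0 in range(rows):
        for c0 in range(cols):
            if not grid[r0, c0] or seen[r0, c0]:
                continue
            # Breadth-first search over one connected component.
            size, queue = 0, deque([(r0, c0)])
            seen[r0, c0] = True
            while queue:
                r, c = queue.popleft()
                size += 1
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < rows and 0 <= cc < cols
                            and grid[rr, cc] and not seen[rr, cc]):
                        seen[rr, cc] = True
                        queue.append((rr, cc))
            if size >= m:
                return True
    return False


def pick_m(false_alarm, miss, m_values):
    """Choose the m whose (false alarm, miss) point lies closest to the origin."""
    dist = np.hypot(np.asarray(false_alarm, dtype=float), np.asarray(miss, dtype=float))
    return m_values[int(np.argmin(dist))]


# Patch labels for one image: one component of size 3 and one isolated patch.
grid = [[1, 1, 0, 0],
        [0, 1, 0, 1],
        [0, 0, 0, 0]]
image_is_mvp = has_mvp_region(grid, m=3)

# Hypothetical training rates measured for m = 1, 2, 3.
best_m = pick_m([0.5, 0.2, 0.1], [0.0, 0.1, 0.4], [1, 2, 3])
```

Raising `m` removes small isolated detections such as the single patch on the right of the grid, at the cost of possibly discarding small true MVP regions, which is the trade-off the ROC sweep is meant to balance.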


III. VALIDATION AND EXPERIMENTAL RESULTS 

In this section, we present the experimental results of applying
DFDL to three diverse histopathological image datasets
and compare our results with different competing methods:

• WND-CHARM in conjunction with SVM:
this method combines state-of-the-art feature extraction and
classification methods. We use the collection of features from
WND-CHARM, which is known to be a powerful toolkit
of features for medical images. While the original paper
used weighted nearest neighbor as a classifier, we use a
more powerful classifier (SVM) to further enhance
classification accuracy. We pick the most relevant features for
histopathology, including but not limited to (color channel-wise)
histogram information, image statistics, morphological
features and wavelet coefficients from each color channel. The
source code for WND-CHARM is made available by the National
Institutes of Health online at http://ome.grc.nia.nih.gov/

• SRC [24]: We apply SRC on the vectorization of the luminance
channel of the histopathological images, as proposed
initially for face recognition and applied widely thereafter.

• SHIRC: Srinivas et al. presented a simultaneous
sparsity model for multi-channel histopathology image
representation and classification which extends the standard
SRC [24] approach by designing three color dictionaries
corresponding to the RGB channels. The MATLAB code for
the algorithms is posted online at: http://signal.ee.psu.edu/
histimg.html

• LC-KSVD and FDDL: These are two well-known
dictionary learning methods which were applied to
object recognition tasks such as face, digit, gender, vehicle and
animal recognition but, to our knowledge, have not been applied to
histopathological image classification. To obtain a fair comparison,
dictionaries are learned on the same training patches.
Classification is then carried out using the learned dictionaries
on non-overlapping patches in the same way described in
Section II-D1.

• Nayak's: In recent relevant work, Nayak et al.
proposed a patch-based method to solve the problem of
classification of tumor histopathology via sparse feature learning.
The feature vectors are then fed into an SVM to find the class
label of each patch.

A. Experimental Set-Up: Image Datasets 

IBL dataset: Each image contains a number of regions
of interest (RoIs), and we have chosen a total of 120 images
(RoIs), consisting of a randomly selected set of 20 images
for training and the remaining 100 RoIs for testing. Images are
downsampled for computational purposes such that the size of a
cell is around 20-by-20 pixels. Examples of images from
this dataset are shown in Fig. 2. Experiments in Section III-B
below are conducted with 10 training images per class, 10000
patches of size 20-by-20 for training per class, $k = 500$ bases
for each dictionary, $\lambda = 0.1$ and $\rho = 0.001$. These parameters
are chosen using cross-validation [38].

ADL dataset: This dataset contains bovine histopathology
images from three sub-datasets of kidney, lung and spleen.
Each sub-dataset consists of images of size 4000 × 3000 pixels
from two classes: healthy and inflammatory. Each class has
around 150 images, from which 40 images are chosen for
training; the remaining ones are used for testing. The number of
training patches, the number of bases, $\lambda$ and $\rho$ are the same as in the IBL
dataset. The classification procedure for the IBL and ADL datasets
is described in Section II-D1.

Figure 7: Example bases learned from different methods on different
datasets. DFDL, LC-KSVD, FDDL in IBL and ADL datasets.

TCGA dataset: We use a total of 190 images (RoIs)
(resolution 3000 × 3000) from the TCGA, in which 57 images
contain MVP regions and 133 have no traces of MVP.
From each class, 20 images are randomly selected for training.
The classification procedure for this dataset is described in
Section II-D2.

Each tissue specimen in these datasets is fixed on a scanning
bed and digitized using a digitizer at 40× magnification.

B. Validation of Central Idea: Visualization of Discovered Features

This section provides experimental validation of the central
hypothesis of this paper: by imposing sparsity constraints that
force intra-class differences to be small while simultaneously
emphasizing inter-class differences, the class-specific
bases obtained are discriminative.

Example bases obtained by different dictionary learning
methods are visualized in Fig. 7. By visualizing these bases,
we emphasize that our DFDL is able to look for discriminative
visual features from which pathologists could understand the
reasons behind diseases. In the spleen dataset, for example,
it is difficult to discern the differences between the two
classes by eye. However, by looking at the DFDL learned
bases, we can see that the distribution of cells in the two classes
is different, such that a larger number of cells appears in a
normal patch. These differences may provide pathologists with a
visual cue to classify these images without advanced tools.
Moreover, for the IBL dataset, UDH bases visualize elongated
cells with sharp edges while DCIS bases present more rounded
cells with blurry boundaries, which is consistent with their
descriptions in the literature; for ADL-Lung, we observe that
a healthy lung is characterized by large clear openings of the
alveoli, while in the inflamed lung, the alveoli are filled with
bluish-purple inflammatory cells. This distinction is very clear
in the bases learned from DFDL, where white regions appear
more in normal bases than in inflammatory bases, and no such
information can be deduced from LC-KSVD or FDDL bases.
In comparison, FDDL fails to discover discriminative visual
features that are interpretable, and LC-KSVD learns bases with
inter-class differences less significant than those of DFDL
bases. Furthermore, these LC-KSVD bases do not present key
properties of each class, especially in the lung dataset.

Figure 8: Example of sparse codes using DFDL and LC-KSVD
approaches on lung dataset. Left: normal lung (class 1). Right:
inflammatory lung (class 2). Row 1: test images. Row 2: Sparse codes
visualization using DFDL. Row 3: Sparse codes visualization using
LC-KSVD. The x axis indicates the dimensions of sparse codes, with codes
on the left of the red lines corresponding to bases of class 1 and those on the
right to bases of class 2. The y axis shows the values of those codes. In one
vertical line, different dots represent values of non-zero coefficients
of different patches.

To understand more about the significance of discriminative
bases for classification, let us first go back to SRC.
For simplicity, let us consider a problem with two classes
with corresponding dictionaries $D_1$ and $D_2$. The identity of
a new patch $y$, which, for instance, comes from class 1, is
determined by the SRC equations given earlier (see (7)). In order
to obtain good results, we expect most of the active coefficients
to be present in $\delta_1(\hat{s})$. For $\delta_2(\hat{s})$, its
non-zeros, if they exist, should have small magnitude. Now, suppose
that one basis, $d_1$, in $D_1$ looks very similar to another basis,
$d_2$, in $D_2$. When doing sparse coding, if one patch in class 1
uses $d_1$ for reconstruction, it is highly likely that a similar
patch $y$ in the same class uses $d_2$ for reconstruction instead.
This misusage may lead to the case
$\|y - D_1\delta_1(\hat{s})\|_2 > \|y - D_2\delta_2(\hat{s})\|_2$,
resulting in a misclassified patch. For this reason, the more
discriminative the bases are, the better the performance.
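
The residual comparison described above can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' implementation: the helper names `omp` and `src_classify`, the plain greedy OMP solver, and the toy identity-matrix dictionaries are all assumptions made for the example. Class indices are 0-based here, so index 0 plays the role of class 1 in the text.

```python
import numpy as np


def omp(D, y, L):
    """Greedy Orthogonal Matching Pursuit: code y over D with at most L non-zeros."""
    residual = y.astype(float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(L):
        if np.linalg.norm(residual) < 1e-10:
            break
        # Select the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break
        support.append(j)
        # Re-fit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    s = np.zeros(D.shape[1])
    if support:
        s[support] = coef
    return s


def src_classify(D_list, y, L):
    """Return (label, residuals) from class-wise reconstruction residuals."""
    D = np.hstack(D_list)
    s = omp(D, y, L)
    residuals = []
    start = 0
    for D_i in D_list:
        k_i = D_i.shape[1]
        delta_i = s[start:start + k_i]  # coefficients belonging to class i only
        residuals.append(float(np.linalg.norm(y - D_i @ delta_i)))
        start += k_i
    return int(np.argmin(residuals)), residuals


# Toy example (hypothetical data): two perfectly discriminative dictionaries.
identity = np.eye(8)
D1, D2 = identity[:, :4], identity[:, 4:]
y = 1.0 * D1[:, 0] - 0.5 * D1[:, 3]  # patch built from class-1 bases only
label, residuals = src_classify([D1, D2], y, L=2)
print(label, residuals)
```

In this toy setting all active coefficients fall on the first dictionary, so its residual is essentially zero. If an atom of `D2` were nearly identical to one in `D1`, the greedy selection could pick either of them, which is the misusage the paragraph above warns about.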

To verify this argument empirically, we run one experiment
on one normal and one inflammatory image from the lung dataset,
in which the differences between DFDL bases and LC-KSVD bases
are most significant. From these images, patches are extracted,
then their sparse codes are calculated using two dictionaries
formed by DFDL bases and LC-KSVD bases. Fig. 8 demonstrates
our results. Note that the plots in Figs. 8c) and d)
correspond to DFDL while those in Figs. 8e) and f) are for
LC-KSVD. Most of the active coefficients in Fig. 8c) are gathered
on the left of the red line, and their values are also greater than
the values on the right. This means that $D_1$ contributes more to
reconstructing the lung-normal image in Fig. 8a) than $D_2$ does.
Similarly, most of the active coefficients in Fig. 8d) are located on
the right of the vertical line. This agrees with what we expect since
the image in Fig. 8a) belongs to class 1 and the one in Fig.
8b) belongs to class 2. On the contrary, for LC-KSVD, active
coefficients in Fig. 8f) are more uniformly distributed on both
sides of the red line, which adversely affects classification. In
Fig. 8e), although active coefficients are strongly concentrated
to the left of the red line, this effect is even more pronounced
with DFDL, i.e. in Fig. 8c).

C. Overall Classification Accuracy 

To verify the performance of our idea, for the IBL and
ADL datasets, we present overall classification accuracies in
the form of bar graphs in Fig. 9. It is evident that DFDL
outperforms other methods in both datasets. Specifically, in
IBL and ADL-Lung, the overall classification accuracies of
DFDL are over 97.75%; the next best rates come from WND-CHARM
(92.85% in IBL) and FDDL (91.56% in ADL-Lung),
respectively. The DFDL rates are also much higher than those
reported in prior work and in our own previous results. In
addition, for ADL-Kidney



Images are classified at the whole-image level.

Figure 9: Bar graphs indicating the overall classification accuracies 
(%) of the competing methods. 



[Figure 10 plot residue. Panel titles: "TCGA" and "ROC curves - TCGA". Legend: WND-CHARM, LC-KSVD, Nayak's et al., FDDL, DFDL.]


Figure 10: Bar graphs (left) indicating the overall classification 
accuracies (%) and the receiver operating characteristic (right) of the 
competing methods for TCGA dataset. 


and ADL-Spleen, our DFDL also provides the best result, with
accuracy rates of nearly 90% and over 92%, respectively.

For the TCGA dataset, the overall accuracies of the competing
methods are shown in Fig. 10, which reveals that DFDL
performance is the second best, bettered only by LC-KSVD
and by less than 0.67% (i.e. one more misclassified image for
DFDL).


D. Complexity analysis 

In this section, we compare the computational complexity
of the proposed DFDL and competing dictionary learning
methods: LC-KSVD [29], FDDL [31] and Nayak's. The
complexity for each dictionary learning method is estimated
as the (approximate) number of operations required by each
method in learning the dictionary (see Appendix for details).
From Table II, it is clear that the proposed DFDL is the least
expensive computationally. Note further that the final column
of Table II shows actual run times of each of the methods.
The parameters were as follows: $c = 2$ (classes), $k = 500$
(bases per class), $N = 10{,}000$ (training patches per class),
data dimension $d = 1200$ (3 channels × 20 × 20), sparsity level
$L = 30$. The run times in the final column of Table II
are in fact consistent with the numbers provided in Table III,
which are calculated by plugging the above parameters into
the second column of Table II.
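
As a quick sanity check, the operation counts can be recomputed by plugging the stated parameters into the complexity expressions. The sketch below is illustrative only: the function name `op_counts` is hypothetical, and the iteration counts q = 1, 3, 10 are an assumption inferred from the tabulated values rather than something stated in the text.

```python
def op_counts(c, k, N, d, L, q):
    """Approximate operation counts from the complexity expressions of Table II."""
    shared = c**2 * k * N  # factor common to all four expressions
    return {
        "DFDL": shared * (2 * d + L**2),
        "LC-KSVD": shared * (2 * d + 2 * c * k + L**2),
        "Nayak's et al.": shared * (2 * d + 2 * q * c * k) + c**2 * d * k**2,
        "FDDL": shared * (2 * d + 2 * q * c * k) + c**3 * d * k**2,
    }


# Parameters quoted in the text; q values are assumed (see lead-in).
params = dict(c=2, k=500, N=10_000, d=1200, L=30)
for q in (1, 3, 10):
    counts = op_counts(q=q, **params)
    print(f"q = {q}:", {name: f"{value:.3e}" for name, value in counts.items()})
```

Only the last two methods depend on q, which is why the DFDL and LC-KSVD counts stay constant across the three settings.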
Table IV: CONFUSION MATRIX: ADL (%).

                Kidney                  Lung                    Spleen
Class           Health   Inflammatory   Health   Inflammatory   Health   Inflammatory   Method
Health          83.27    16.73          83.20    16.80          87.23    12.77          WND-CHARM (*)
                87.50    12.50          72.50    27.50          70.83    29.17          SRC (*)
                82.50    17.50          75.00    25.00          65.00    35.00          SHIRC
                83.26    16.74          93.15     6.85          86.94    13.06          FDDL [31]
                86.84    13.16          85.59    15.41          89.75    10.25          LC-KSVD [29]
                73.08    26.92          89.55    10.45          86.44    13.56          Nayak's et al.
                88.21    11.79          96.52     3.48          92.88     7.12          DFDL
Inflammatory    14.22    85.78          14.31    83.69          10.48    89.52          WND-CHARM (*)
                25.00    75.00          24.17    75.83          20.83    79.17          SRC (*)
                16.67    83.33          15.00    85.00          11.67    88.33          SHIRC
                19.88    80.12          10.00    90.00           8.57    91.43          FDDL [31]
                19.25    81.75          10.89    89.11           8.57    91.43          LC-KSVD [29]
                26.92    73.08          25.90    74.10           6.05    93.95          Nayak's et al.
                 9.92    90.02           2.57    97.43           7.89    92.01          DFDL

(*) Images are classified at the whole-image level.


Table I: CONFUSION MATRIX: IBL.

Class    UDH      DCIS     Method
UDH      91.75     8.25    WND-CHARM (*)
         68.00    32.00    SRC (*)
         93.33     6.67    SHIRC
         84.80    15.20    FDDL [31]
         90.29     9.71    LC-KSVD [29]
         85.71    14.29    Nayak's et al.
         96.00     4.00    DFDL
DCIS      5.77    94.23    WND-CHARM (*)
         44.00    56.00    SRC (*)
         10.00    90.00    SHIRC
         10.00    90.00    FDDL [31]
         14.86    85.14    LC-KSVD [29]
         23.43    76.57    Nayak's et al.
          0.50    99.50    DFDL

(*) Images are classified at the whole-image level.


Table II: Complexity analysis for different dictionary learning
methods.

Method              Complexity                            Running time
DFDL                $c^2 kN(2d + L^2)$                    ~ 0.5 hours
LC-KSVD [29]        $c^2 kN(2d + 2ck + L^2)$              ~ 3 hours
Nayak's et al.      $c^2 kN(2d + 2qck) + c^2 dk^2$        ~ 8 hours
FDDL [31] (*)       $c^2 kN(2d + 2qck) + c^3 dk^2$        > 40 hours

(*) $q$ is the number of iterations required for $\ell_1$-minimization in
the sparse coding step.


E. Statistical Results: Confusion Matrices and ROC Curves

Next, we present a more elaborate interpretation of classification
performance in the form of confusion matrices and ROC
curves. Each row of a confusion matrix refers to the actual
class identity of test images and each column indicates the
classifier output. Tables I, IV and V show the mean confusion
matrices for all three datasets. In continuation of the trends
from Fig. 9, in Table IV DFDL offers the best disease
detection accuracy in almost all datasets for each organ, while
maintaining high classification accuracy for healthy images.


Typically in medical image classification problems, pathologists
desire algorithms that reduce the probability of miss
(diseased images misclassified as healthy ones) while
also ensuring that the false alarm rate remains low. However,
there is a trade-off between these two quantities, conveniently
described using receiver operating characteristic (ROC) curves.


Table III: Estimated number of operations required in different
dictionary learning methods.

Method              $q = 1$                  $q = 3$                   $q = 10$
DFDL                $6.6 \times 10^{10}$     $6.6 \times 10^{10}$      $6.6 \times 10^{10}$
LC-KSVD             $1.06 \times 10^{11}$    $1.06 \times 10^{11}$     $1.06 \times 10^{11}$
Nayak's et al.      $8.92 \times 10^{10}$    $1.692 \times 10^{11}$    $4.492 \times 10^{11}$
FDDL                $9.04 \times 10^{10}$    $1.704 \times 10^{11}$    $4.504 \times 10^{11}$

Table V: CONFUSION MATRIX: TCGA (%).

Class      Not MVP   MVP      Method
Not MVP    76.68     23.32    WND-CHARM
           92.92      7.08    Nayak's et al. *
           96.46      3.54    LC-KSVD [29]
           92.04      7.96    FDDL [31]
           94.69      5.31    DFDL
MVP        21.62     78.38    WND-CHARM
           16.22     83.78    Nayak's et al. *
            8.10     91.90    LC-KSVD [29]
           18.92     81.08    FDDL [31]
            5.41     94.59    DFDL


Fig. 11 and Fig. 10 (right) show the ROC curves for all three
datasets. The lowest curve (closest to the origin) has the best
overall performance and the optimal operating point minimizes
the sum of the miss and false alarm probabilities. It is evident
that the ROC curves for DFDL perform best in comparison to
those of other state-of-the-art methods.


Remark: For the ROC comparisons, we compare the
different flavors of dictionary learning methods (the proposed
DFDL, LC-KSVD, FDDL and Nayak's) because, as
Table IV shows, they are the most competitive methods. For
the IBL and ADL datasets, $\theta$, as defined in an earlier figure, is
varied from 0 to 1 to acquire the curves; whereas for
the TCGA dataset, the number of connected classified-as-MVP
patches, $m$, is varied from 1 to 20 to obtain the curves. It is
worth re-emphasizing that DFDL achieves these results even
though its complexity is lower than that of competing methods.
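
The threshold sweep behind such curves can be sketched as follows. This is an illustrative example under stated assumptions, not the procedure used in the paper: it assumes each image has a score in [0, 1] (for instance, the fraction of its patches classified as diseased) and that an image is declared diseased when the score exceeds the threshold. The function names and the toy scores are hypothetical.

```python
import numpy as np


def miss_false_alarm_curve(scores, is_diseased, thresholds):
    """Return (threshold, P_false_alarm, P_miss) triples for each threshold."""
    scores = np.asarray(scores, dtype=float)
    is_diseased = np.asarray(is_diseased, dtype=bool)
    curve = []
    for theta in thresholds:
        predicted_diseased = scores > theta
        # Miss: a diseased image that is predicted healthy.
        p_miss = np.mean(~predicted_diseased[is_diseased])
        # False alarm: a healthy image that is predicted diseased.
        p_fa = np.mean(predicted_diseased[~is_diseased])
        curve.append((float(theta), float(p_fa), float(p_miss)))
    return curve


def best_operating_point(curve):
    """Pick the point minimizing P_false_alarm + P_miss."""
    return min(curve, key=lambda point: point[1] + point[2])


# Hypothetical per-image scores and ground truth (1 = diseased).
scores = [0.05, 0.12, 0.33, 0.45, 0.62, 0.81, 0.88, 0.97]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
curve = miss_false_alarm_curve(scores, labels, np.linspace(0.0, 1.0, 11))
theta_star, p_fa_star, p_miss_star = best_operating_point(curve)
print(theta_star, p_fa_star, p_miss_star)
```

Plotting the miss probability against the false alarm probability for each threshold gives a curve of the kind discussed above, where lower curves are better.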


F. Performance vs. size of training set

Real-world histopathological classification tasks must often
contend with the lack of large training sets. To
understand the training dependence of the various techniques, we
present a comparison of overall classification accuracy as a
function of the training set size for the different methods. We
also present a comparison of classification rates as a function
of the number of training patches for the different dictionary
learning methods. (Since WND-CHARM is applied at the whole-image
level, there is no result for it in the comparison over training
patches.) In Fig. 12, overall classification accuracy
is reported for the IBL and ADL datasets corresponding to five
scenarios. It is readily apparent that DFDL exhibits the most
graceful decline as training is reduced.

[Figure 11 plot residue. x axis: probability of false alarm. Legend: LCKSVD, Nayak's et al., FDDL, DFDL.]

Figure 11: Receiver operating characteristic (ROC) curves for different organs, methods, and datasets (IBL and ADL).

[Figure 12 plot residue. Panel titles: IBL, Kidney, Lung, Spleen.]

Figure 12: Overall classification accuracy (%) as a function of training set size per class. Top row: number of training patches.
Bottom row: number of training images.

[Figure 13 plot residue. Legend: LCKSVD, NANDITA, FDDL, DFDL.]

Figure 13: Overall classification accuracy (%) as a function of number of training bases.


G. Performance vs. number of training bases

We now compare the behavior of each dictionary learning
method as the number of bases in each dictionary varies
from 200 to 600 (with the patch size fixed at 20 × 20
pixels). Results reported in Fig. 13 confirm that DFDL again
outperforms other methods. In general, the overall accuracies of
DFDL on different datasets remain high when we reduce the
number of training bases. Interpreted another way, these results
illustrate that DFDL is fairly robust to changes in parameters,
which is a highly desirable trait in practice.

IV. Discussion and Conclusion

In this paper, we address the histopathological image
classification problem from a feature discovery and dictionary
learning standpoint. This is an important and challenging
problem; the main challenge comes from the geometrical
richness of tissue images, which makes it difficult to
obtain reliable discriminative features for classification.
We therefore investigate a framework that captures this
structural richness and discriminates between
different tissue types, and to this end we propose
the DFDL method, which learns discriminative features for
histopathology images. Our work aims to produce a more
versatile histopathological image classification system through
the design of discriminative, class-specific dictionaries, which
is hence capable of automatic feature discovery from example
training image samples.

Our DFDL algorithm learns these dictionaries by leveraging
the idea of sparse representation of in-class and
out-of-class samples. This idea leads to an optimization problem
which encourages intra-class similarities and emphasizes the
inter-class differences. Ultimately, the optimization is
done by solving the proposed equivalent optimization problem
using a convexifying trick. Similar to other dictionary learning
methods (and machine learning approaches in general), DFDL also
requires a set of regularization parameters. Our DFDL requires only
one parameter, $\rho$, in its training process, which is chosen by
cross validation: plugging different sets of parameters
into the problem and selecting the one which gives the best
performance on the validation set. In the context of applying
DFDL to real-world histopathological image slides, there
are quite a few other settings that should be carefully chosen, such
as patch size, tiling method, number of connected components
in the MVP detection, etc. Of more importance is the patch size
to be picked for each dataset, which is mostly determined by
consultation with the medical expert in the specific problem
under investigation and the type of features that we should be
looking for. For simplicity we employ regular tiling; however,
using prior domain knowledge this may be improved. For
instance, in the context of MVP detection, informed selection
of patch locations using existing disease detection and
localization methods can be used to further improve
the detection of disease.
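
To make the regular tiling step concrete, the sketch below splits an image into 20 × 20 patches with 3 channels and flattens each one into a 1200-dimensional column, matching the data dimension quoted in the complexity analysis. It is a hedged illustration only: the function name `regular_tiling` is hypothetical, and the choices of non-overlapping tiles, dropped border remainders, and row-major flattening are assumptions rather than details confirmed by the text.

```python
import numpy as np


def regular_tiling(image, patch_size=20):
    """Split an H x W x C image into non-overlapping square patches.

    Returns a matrix with one flattened patch per column. Border pixels
    that do not fill a whole patch are dropped (an assumption).
    """
    height, width, _ = image.shape
    n_rows = height // patch_size
    n_cols = width // patch_size
    patches = []
    for r in range(n_rows):
        for c in range(n_cols):
            patch = image[
                r * patch_size:(r + 1) * patch_size,
                c * patch_size:(c + 1) * patch_size,
                :,
            ]
            patches.append(patch.reshape(-1))
    return np.stack(patches, axis=1)


# Hypothetical 40 x 60 RGB image: 2 x 3 = 6 patches of dimension 20*20*3 = 1200.
image = np.arange(40 * 60 * 3, dtype=float).reshape(40, 60, 3)
Y = regular_tiling(image)
print(Y.shape)
```

An informed selection of patch locations, as suggested above, would replace the two nested loops with a list of positions supplied by a detection or localization step.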

Experiments are carried out on three diverse histopathological
datasets to show the broad applicability of the proposed
DFDL method. It is illustrated that our method is competitive with
or outperforms state-of-the-art alternatives, particularly in the
regime of realistic or limited training set size. It is also shown
that, with minimal parameter tuning and algorithmic changes,
the DFDL method can be easily applied to different problems
of different natures, which makes it a good candidate for
automated medical diagnosis in place of customized,
problem-specific frameworks for every single diagnosis task.
We also make a software toolbox available to help deploy
DFDL widely as a diagnostic tool in existing histopathological
image analysis systems. Particular problems such as grading
and detecting specific regions in histopathology may be
investigated using our proposed techniques.


Appendix 

COMPLEXITY ANALYSIS 

In this section, we compare the computational complexity
of the proposed DFDL and competing dictionary learning
methods: LC-KSVD [29], FDDL [31] and Nayak's. The
complexity for each dictionary learning method is estimated
as the (approximate) number of operations required by each
method in learning the dictionary. For simplicity, we assume
that the number of training samples and the number of dictionary
bases in each class are the same, which means: $N_i = N_j = N$,
$k_i = k_j = k$, $\forall i, j = 1, 2, \dots, c$, and also
$L_i = L_j = L$, $\forall i, j = 1, 2, \dots, c$. For
consistency, we have changed notations in those methods
by denoting $Y$ as training samples and $S$ as the sparse code.

In most dictionary learning methods, the complexity
of the sparse coding step, which is often an $\ell_0$ or $\ell_1$
minimization problem, dominates that of the dictionary update step,
which is typically solved by either block coordinate descent or
singular value decomposition. Then, in order to compare
the complexity of different dictionary learning methods, we
focus on comparing the complexity of the sparse coding steps in
each iteration.


A. Complexity of the DFDL 

The most expensive computation in DFDL is solving an
Orthogonal Matching Pursuit (OMP) problem. Given a
set of samples $Y \in \mathbb{R}^{d \times N}$, a dictionary
$D \in \mathbb{R}^{d \times k}$ and sparsity level $L$, the OMP
problem is:

$$S^* = \arg\min_{\|s_i\|_0 \le L} \|Y - DS\|_F^2.$$

R. Rubinstein et al. [40] reported the complexity of Batch-OMP
when the dictionary is stored in memory in its entirety
as: $T_{\text{b-omp}} = N(2dk + L^2 k + 3Lk + L^3) + dk^2$. Assuming an
asymptotic behavior of $L \ll k \approx d \ll N$, the above expression
can be simplified to:

$$T_{\text{b-omp}} \approx N(2dk + L^2 k) = kN(2d + L^2). \qquad (15)$$

This result will also be utilized in analyzing the complexity of
LC-KSVD.

The sparse coding step in our DFDL consists of solving
$c$ sparse coding problems:
$\hat{S}_i = \arg\min_{\|s\|_0 \le L} \|Y - D_i S_i\|_F^2$.
With $Y \in \mathbb{R}^{d \times cN}$ and $D_i \in \mathbb{R}^{d \times k}$,
each problem has a complexity of $k(cN)(2d + L^2)$. Then the total
complexity of these $c$ problems is:
$T_{\text{DFDL}} \approx c^2 kN(2d + L^2)$.
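
To see how much the simplification in (15) discards, the full and simplified Batch-OMP counts can be compared numerically. The sketch below is illustrative: the function names are hypothetical, and the parameter values are the ones quoted in the complexity analysis of the main text, used here only as an example.

```python
def batch_omp_full(N, d, k, L):
    """Full Batch-OMP operation count as quoted from Rubinstein et al."""
    return N * (2 * d * k + L**2 * k + 3 * L * k + L**3) + d * k**2


def batch_omp_simplified(N, d, k, L):
    """Simplified count of equation (15)."""
    return k * N * (2 * d + L**2)


# Example values (assumed for illustration): one problem with N samples.
N, d, k, L = 10_000, 1200, 500, 30
full = batch_omp_full(N, d, k, L)
approx = batch_omp_simplified(N, d, k, L)
ratio = approx / full
print(full, approx, round(ratio, 3))
```

For these values the simplified expression keeps roughly 94% of the full count, so the dropped terms change the estimate by only a few percent.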

B. Complexity of LC-KSVD 

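We consider LC-KSVD1 only (LC-KSVD2 has a higher
complexity), whose optimization problem is written as [29]:

$$(D, A, S) = \arg\min_{D,A,S} \|Y - DS\|_F^2 + \alpha\|Q - AS\|_F^2 \quad \text{s.t.} \quad \|s_i\|_0 \le L,$$

and it is rewritten in the K-SVD form:

$$(D, A, S) = \arg\min_{D,A,S} \left\| \begin{bmatrix} Y \\ \sqrt{\alpha}\,Q \end{bmatrix} - \begin{bmatrix} D \\ \sqrt{\alpha}\,A \end{bmatrix} S \right\|_F^2 \quad \text{s.t.} \quad \|s_i\|_0 \le L. \qquad (16)$$

Since $Q \in \mathbb{R}^{ck \times cN}$ and $A \in \mathbb{R}^{ck \times ck}$, we have
$\tilde{Y} = \begin{bmatrix} Y \\ \sqrt{\alpha}\,Q \end{bmatrix} \in \mathbb{R}^{(d+ck) \times cN}$
and
$\tilde{D} = \begin{bmatrix} D \\ \sqrt{\alpha}\,A \end{bmatrix} \in \mathbb{R}^{(d+ck) \times ck}$.
Neglecting the computation of scalar multiplications, the complexity of (16) is:

$$T_{\text{LC-KSVD}} \approx (ck)(cN)\left(2(d + ck) + L^2\right) = c^2 kN(2d + 2ck + L^2).$$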
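
The equivalence between the two-term objective and its stacked K-SVD form can be checked numerically. The sketch below is a small illustration with random matrices; the sizes, the value of `alpha`, and all variable names are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
c, k, N, d, alpha = 2, 3, 5, 4, 0.5  # small illustrative sizes (assumed)

Y = rng.standard_normal((d, c * N))      # training samples
D = rng.standard_normal((d, c * k))      # dictionary
Q = rng.standard_normal((c * k, c * N))  # label-consistency targets
A = rng.standard_normal((c * k, c * k))  # linear transform
S = rng.standard_normal((c * k, c * N))  # sparse codes (dense here, for the check)

# Two-term objective: ||Y - DS||_F^2 + alpha * ||Q - AS||_F^2
two_term = (
    np.linalg.norm(Y - D @ S, "fro") ** 2
    + alpha * np.linalg.norm(Q - A @ S, "fro") ** 2
)

# Stacked form: one reconstruction error with augmented data and dictionary.
Y_tilde = np.vstack([Y, np.sqrt(alpha) * Q])
D_tilde = np.vstack([D, np.sqrt(alpha) * A])
stacked = np.linalg.norm(Y_tilde - D_tilde @ S, "fro") ** 2

print(np.isclose(two_term, stacked), Y_tilde.shape, D_tilde.shape)
```

The stacked dictionary has $d + ck$ rows and $ck$ columns, which is why the data dimension $d$ in (15) is replaced by $d + ck$ and the number of bases by $ck$ in the LC-KSVD count.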

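C. Complexity of Nayak's

The optimization problem in Nayak's is:

$$(D, S, W) = \arg\min_{D,S,W} \|Y - DS\|_F^2 + \lambda\|S\|_1 + \|S - WY\|_F^2.$$

$S$ is estimated via the gradient descent method, an iterative
method whose main computational task in each iteration is
to calculate the gradient of $g(S) = \|Y - DS\|_F^2 + \|S - WY\|_F^2$
with respect to $S$. We have:

$$\frac{\partial g(S)}{\partial S} = 2\left((D^T D + I)S - (D^T + W)Y\right),$$

where $D^T D + I$ and $(D^T + W)Y$ could be precomputed and, at
each step, only $(D^T D + I)S$ needs to be recalculated after $S$ is
updated. With $D \in \mathbb{R}^{d \times ck}$, $S \in \mathbb{R}^{ck \times cN}$,
$Y \in \mathbb{R}^{d \times cN}$ and $W \in \mathbb{R}^{ck \times d}$,
the complexity of the sparse coding step can be estimated as:

$$T_{\text{Nayak's}} \approx (ck)d(ck) + 2(ck)d(cN) + 2q(ck)^2 cN \qquad (17)$$

$$= c^2 kN(2d + 2qck) + c^2 dk^2, \qquad (18)$$

with $q$ being the average number of iterations needed for
convergence. Here we have ignored matrix subtractions,
additions and scalar multiplications and focused on matrix
multiplications only. We have also used the approximation that
the complexity of $AB$ is $2mnp$ where $A \in \mathbb{R}^{m \times n}$,
$B \in \mathbb{R}^{n \times p}$. The first term in (17) is of
$D^T D + I$ (note that this matrix is symmetric, so it needs only
half of the regular operations), the second term is of
$(D^T + W)Y$ and the last one comes from $q$ times the
complexity of calculating $(D^T D + I)S$.

D. Complexity of FDDL

The sparse coding step in FDDL [31] requires solving $c$
class-specific problems:

$$S_i = \arg\min_{S_i} \Big\{ \|Y_i - DS_i\|_F^2 + \|Y_i - D_i S_i^i\|_F^2 + \sum_{j \ne i} \|D_j S_i^j\|_F^2 + \lambda_2 \Big( \|S_i - M_i\|_F^2 - \sum_{k=1}^{c} \|M_k - M\|_F^2 + \eta\|S_i\|_F^2 \Big) + \lambda_1 \|S_i\|_1 \Big\},$$

with $D = [D_1, \dots, D_c]$, $S_i = [(S_i^1)^T, \dots, (S_i^c)^T]^T$, and
$M_k = [m_k, \dots, m_k] \in \mathbb{R}^{ck \times N}$,
$M = [m, \dots, m] \in \mathbb{R}^{ck \times N}$, where
$m_k$ and $m$ are the mean vectors of $S_k$ and $S = [S_1, \dots, S_c]$
respectively. The algorithm for solving this problem uses the
Iterative Projective Method, whose complexity depends
on computing the gradient of six Frobenius-norm terms in
the above optimization problem at each iteration.

For the first three terms, the gradient could be computed as:

$$\frac{\partial}{\partial S_i}\Big( \|Y_i - DS_i\|_F^2 + \|Y_i - D_i S_i^i\|_F^2 + \sum_{j \ne i} \|D_j S_i^j\|_F^2 \Big) = 2(D^T D)S_i - 2D^T Y_i + \begin{bmatrix} 2(D_1^T D_1)S_i^1 \\ \vdots \\ 2(D_i^T D_i)S_i^i - 2D_i^T Y_i \\ \vdots \\ 2(D_c^T D_c)S_i^c \end{bmatrix}, \qquad (19)$$

where $D^T D$ and $D^T Y_i$ could be precomputed with the total
cost of $(ck)d(ck) + 2(ck)dN = cdk(2N + ck)$; $D_i^T D_i$ and $D_i^T Y_i$
could be extracted from $D^T D$ and $D^T Y_i$ at no cost; at each
iteration, the cost of computing $(D^T D)S_i$ is $2(ck)^2 N$, and each of
$(D_j^T D_j)S_i^j$ could be attained in the intermediate step of
computing $(D^T D)S_i$. Therefore, with $q$ iterations, the
computational cost of (19) is:

$$cdk(2N + ck) + 2qc^2 k^2 N. \qquad (20)$$

For the last three terms, we will prove that:

$$\frac{\partial}{\partial S_i} \|S_i - M_i\|_F^2 = 2(S_i - M_i), \qquad (21)$$

$$\frac{\partial}{\partial S_i} \sum_{k=1}^{c} \|M_k - M\|_F^2 = 2(M_i - M), \qquad (22)$$

$$\frac{\partial}{\partial S_i} \eta\|S_i\|_F^2 = 2\eta S_i. \qquad (23)$$

Indeed, let $E_{m,n}$ be an all-one matrix in $\mathbb{R}^{m \times n}$; one could easily
verify that:

$$M_k = \frac{1}{N} S_k E_{N,N}; \quad M = \frac{1}{cN} S E_{cN,N} = \frac{1}{cN} \sum_{i=1}^{c} S_i E_{N,N};$$

$$E_{m,n} E_{n,p} = n E_{m,p}; \quad \Big(I - \frac{1}{N} E_{N,N}\Big)\Big(I - \frac{1}{N} E_{N,N}\Big)^T = \Big(I - \frac{1}{N} E_{N,N}\Big).$$

Thus, (21) can be obtained by:

$$\frac{\partial}{\partial S_i} \|S_i - M_i\|_F^2 = \frac{\partial}{\partial S_i} \Big\|S_i - \frac{1}{N} S_i E_{N,N}\Big\|_F^2 = \frac{\partial}{\partial S_i} \Big\|S_i \Big(I - \frac{1}{N} E_{N,N}\Big)\Big\|_F^2$$

$$= 2S_i \Big(I - \frac{1}{N} E_{N,N}\Big)\Big(I - \frac{1}{N} E_{N,N}\Big)^T = 2S_i \Big(I - \frac{1}{N} E_{N,N}\Big) = 2(S_i - M_i).$$

For (22), with simple algebra, we can prove that:

$$\frac{\partial}{\partial S_i} \|M_i - M\|_F^2 = \frac{2(c-1)}{cN} (M_i - M) E_{N,N} = \frac{2(c-1)}{c} (M_i - M),$$

$$\frac{\partial}{\partial S_i} \|M_k - M\|_F^2 = \frac{2}{cN} (M - M_k) E_{N,N} = \frac{2}{c} (M - M_k), \quad (k \ne i).$$

Summing these terms over $k$ and using $\sum_{k=1}^{c} M_k = cM$ gives (22).

Compared to calculating (19), (21), (22) and (23) require
much less computation. As a result, the total cost of solving
$S_i$ approximately equals (20); and the total estimated cost
of the sparse coding step of FDDL is estimated as $c$ times the cost of
each class-specific problem and approximately equals:

$$T_{\text{FDDL}} \approx c^2 dk(2N + ck) + 2qc^3 k^2 N = c^2 kN(2d + 2qck) + c^3 dk^2.$$

Final analyzed results of four different dictionary learning
methods are reported in Table II.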
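
The gradient identities (21) and (22) can also be checked numerically with finite differences. The sketch below is only a sanity check under stated assumptions: small random matrices, equal class sizes, and hypothetical helper names (`class_mean`, `global_mean`, `numerical_gradient`).

```python
import numpy as np

rng = np.random.default_rng(0)
c, rows, N = 3, 4, 5  # small illustrative sizes (assumed)
S = [rng.standard_normal((rows, N)) for _ in range(c)]
E = np.ones((N, N))


def class_mean(S_k):
    """M_k: every column equals the mean column of S_k."""
    return S_k @ E / N


def global_mean(S_list):
    """M: every column equals the mean over all classes (equal class sizes)."""
    return sum(S_list) @ E / (len(S_list) * N)


def numerical_gradient(f, X, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at X."""
    grad = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        step = np.zeros_like(X)
        step[idx] = eps
        grad[idx] = (f(X + step) - f(X - step)) / (2 * eps)
    return grad


def within_class(S_i):
    """||S_i - M_i||_F^2 as a function of the first class's codes."""
    return np.linalg.norm(S_i - class_mean(S_i), "fro") ** 2


def between_class(S_i):
    """sum_k ||M_k - M||_F^2 with the first class's codes replaced by S_i."""
    S_list = [S_i] + S[1:]
    M = global_mean(S_list)
    return sum(np.linalg.norm(class_mean(S_k) - M, "fro") ** 2 for S_k in S_list)


grad21 = numerical_gradient(within_class, S[0])
grad22 = numerical_gradient(between_class, S[0])
M_i, M = class_mean(S[0]), global_mean(S)
print(np.allclose(grad21, 2 * (S[0] - M_i), atol=1e-5))
print(np.allclose(grad22, 2 * (M_i - M), atol=1e-5))
```

Both functions are quadratic in $S_i$, so central differences agree with the closed forms up to rounding error.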